6 research outputs found

    An evaluation of DGA classifiers

    Domain Generation Algorithms (DGAs) are a popular technique used by contemporary malware for command-and-control (C&C) purposes. Such malware uses DGAs to create a set of domain names that, when resolved, provide the information necessary to establish a link to a C&C server. Automated discovery of such domain names in real-time DNS traffic is critical for network security because it makes it possible to detect infections and, in some cases, to take countermeasures that disrupt the communication and identify infected machines. Detecting the specific DGA malware family gives the administrator valuable information about the kind of infection and the steps that need to be taken. In this paper we compare and evaluate machine learning methods that classify domain names as benign or DGA, and label the latter according to their malware family. Unlike previous work, we select data for the test and training sets according to observation time and known seeds. This allows us to assess the robustness of the trained classifiers for detecting domains generated by the same families at a different time or when seeds change. Our study includes tree ensemble models based on human-engineered features and deep neural networks that learn features automatically from domain names. We find that all state-of-the-art classifiers are significantly better at catching domain names from malware families with a time-dependent seed than from time-invariant DGAs. In addition, when applying the trained classifiers to a day of real traffic, we find that many domain names are unjustifiably flagged as malicious, thereby revealing the shortcomings of relying on a standard whitelist for training a production-grade DGA detection system.
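The abstract above describes selecting training and test data by observation time and by seed rather than at random. A minimal Python sketch of that idea follows. It is an illustration under stated assumptions, not the authors' procedure: the `Sample` record, the `split_by_time_and_seed` helper, the cutoff date, the family and seed labels, and the domain names are all hypothetical.

```python
from datetime import date
from typing import NamedTuple


class Sample(NamedTuple):
    """Hypothetical record for one observed domain name."""
    domain: str
    family: str     # "benign" or a DGA family label
    seed: str       # seed identifier; empty for benign domains
    observed: date  # day the domain was seen


def split_by_time_and_seed(samples, cutoff, held_out_seeds):
    """Put a sample in the test set if it was observed on or after
    `cutoff` or was generated with a held-out seed; train on the rest."""
    train, test = [], []
    for s in samples:
        if s.observed >= cutoff or s.seed in held_out_seeds:
            test.append(s)
        else:
            train.append(s)
    return train, test


# Made-up data, for illustration only.
samples = [
    Sample("example.com", "benign", "", date(2017, 1, 10)),
    Sample("qwhdkxjv.net", "family_a", "seed_1", date(2017, 1, 12)),
    Sample("plmoknij.org", "family_a", "seed_2", date(2017, 1, 15)),
    Sample("zzxcvbqw.info", "family_b", "seed_1", date(2017, 3, 2)),
]
train, test = split_by_time_and_seed(samples, date(2017, 2, 1), {"seed_2"})
```

With a split like this, the test set contains only domains from a later period or from seeds the classifier never saw, which is the kind of robustness the abstract says it evaluates.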

    Weakly supervised deep learning for the detection of domain generation algorithms

    Domain generation algorithms (DGAs) have become commonplace in malware that seeks to establish command and control communication between an infected machine and the botmaster. DGAs dynamically and consistently generate large volumes of malicious domain names, only a few of which are registered by the botmaster, within a short time window around their generation time, and subsequently resolved when the malware on the infected machine tries to access them. Deep neural networks that can classify domain names as benign or malicious are of great interest in the real-time defense against DGAs. In contrast with traditional machine learning models, deep networks do not rely on human-engineered features. Instead, they can learn features automatically from data, provided that they are supplied with sufficiently large amounts of suitable training data. Obtaining cleanly labeled ground truth data is difficult and time-consuming. Heuristically labeled data could potentially provide a source of training data for weakly supervised training of DGA detectors. We propose a set of heuristics for automatically labeling domain names monitored in real traffic, and then train and evaluate classifiers with the proposed heuristically labeled dataset. We show through experiments on a dataset with 50 million domain names that such heuristically labeled data is very useful in practice for improving the predictive accuracy of deep learning-based DGA classifiers, and that these deep neural networks significantly outperform a random forest classifier with human-engineered features.

    Estimation of tuberculosis incidence at subnational level using three methods to monitor progress towards ending TB in India, 2015–2020

    Objectives: We verified subnational (state/union territory (UT)/district) claims of achievements in reducing tuberculosis (TB) incidence in 2020 compared with 2015, in India. Design: A community-based survey, analysis of programme data and anti-TB drug sales and utilisation data. Setting: National TB Elimination Program and private TB treatment settings in 73 districts that had filed a claim to the Central TB Division of India for progress towards TB-free status. Participants: Each district was divided into survey units (SU) and one village/ward was randomly selected from each SU. All household members in the selected village were interviewed. Sputum from participants with a history of anti-TB therapy (ATT), those currently experiencing chest symptoms or on ATT were tested using Xpert/Rif/TrueNat. The survey continued until 30 Mycobacterium tuberculosis cases were identified in a district. Outcome measures: We calculated a direct estimate of TB incidence based on incident cases identified in the survey. We calculated an under-reporting factor by matching these cases within the TB notification system. The TB notification adjusted for this factor was the estimate by the indirect method. We also calculated TB incidence from drug sale data in the private sector and drug utilisation data in the public sector. We compared the three estimates of TB incidence in 2020 with TB incidence in 2015. Results: The estimated direct incidence ranged from 19 (Purba Medinipur, West Bengal) to 1457 (Jaintia Hills, Meghalaya) per 100 000 population. Indirect estimates of incidence ranged between 19 (Diu, Dadra and Nagar Haveli) and 788 (Dumka, Jharkhand) per 100 000 population. The incidence using drug sale data ranged from 19 per 100 000 population in Diu, Dadra and Nagar Haveli to 651 per 100 000 population in Centenary, Maharashtra. Conclusion: TB incidence in 1 state, 2 UTs and 35 districts had declined by at least 20% since 2015. Two districts in India were declared TB free in 2020.
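The indirect method described in the abstract above adjusts the notification figure by an under-reporting factor. The Python sketch below shows one plausible reading of that arithmetic. The abstract does not give the formula, so the definition of the factor (survey cases divided by survey cases found in the notification system), the function names, and every number are assumptions made purely for illustration and are not results from the study.

```python
def under_reporting_factor(survey_cases: int, matched_in_notifications: int) -> float:
    """Assumed definition: survey-identified cases divided by the number
    of those cases that could be matched in the notification system."""
    return survey_cases / matched_in_notifications


def rate_per_100k(cases: float, population: int) -> float:
    """Express a case count as a rate per 100 000 population."""
    return cases * 100_000 / population


def indirect_incidence(notified_cases: int, population: int, factor: float) -> float:
    """Notification rate scaled up by the under-reporting factor."""
    return rate_per_100k(notified_cases, population) * factor


# Hypothetical district: 30 survey cases, 20 of them matched in the
# notification system, 2000 notified cases, population 1 000 000.
factor = under_reporting_factor(30, 20)
estimate = indirect_incidence(2000, 1_000_000, factor)
print(f"factor={factor:.2f}, indirect estimate={estimate:.0f} per 100 000")
```

In this made-up case the notification rate of 200 per 100 000 is scaled by a factor of 1.5, giving an indirect estimate of 300 per 100 000.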